The Journal of the Acoustical Society of America
● Acoustical Society of America (ASA)
Preprints posted in the last 90 days, ranked by how well they match The Journal of the Acoustical Society of America's content profile, based on 33 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.
MacLean, J.; Zhou, M.; Bidelman, G.
Entrainment and predictive coding aid speech perception in both quiet and noisy environments. Isochronous, periodic auditory rhythmic cues facilitate entrainment and temporal expectations, which can benefit encoding and perception of target speech. However, most studies using isochronous cues confound periodicity with predictability. To address this, we characterized how systematic changes in the acoustic dimensions of stimulus rate, target phase, periodicity, and predictability of an entraining sound precursor impact the subsequent identification of concurrent speech targets. Target concurrent vowel pairs were preceded by rhythmic woodblock cues which were either periodic-predictable (PP, isochronous rhythm), aperiodic-predictable (AP, accelerating rhythm), or aperiodic-unpredictable (AU, random rhythm). The number of pulses per rhythm was roved to further manipulate predictability. Stimuli also varied in presentation rate (2.5, 4.5, 6.5 Hz) and target speech phase (in-phase, 0°; out-of-phase, 90°, 180°) relative to the preceding entraining rhythm. We also measured participants' musical pulse continuation and standardized speech-in-noise perception abilities. We did not observe any effects of stimulus rhythm, rate, or target phase on target speech identification accuracy. However, reaction times were slowest at the nominal speech rate (4.5 Hz) and were most disrupted by out-of-phase presentations following the PP rhythm. Double-vowel task performance was associated with stronger musical pulse continuation abilities, but not speech-in-noise perception. Our results support the notion that entraining rhythmic cues rely on top-down processing but are relatively muted when stimulus predictability is unknown. Additionally, we find that individual differences in musical pulse perception may underlie the benefits of rhythmic cueing on subsequent speech perception.
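To make the rate and phase manipulation in this abstract concrete, the following is a minimal sketch of how a target phase could be converted into an onset delay. It assumes that phase is expressed as a fraction of the rhythm period at the nominal presentation rate; the abstract does not state this, and the function name `phase_to_delay_ms` is hypothetical.

```python
def phase_to_delay_ms(rate_hz: float, phase_deg: float) -> float:
    """Delay of the target after the expected beat, in milliseconds.

    Assumption (not stated in the abstract): phase is a fraction of the
    rhythm period, so 360 degrees equals one full period at rate_hz.
    """
    period_ms = 1000.0 / rate_hz
    return period_ms * (phase_deg / 360.0)


if __name__ == "__main__":
    # Rates and phases listed in the abstract.
    for rate in (2.5, 4.5, 6.5):
        for phase in (0, 90, 180):
            delay = phase_to_delay_ms(rate, phase)
            print(f"{rate} Hz, {phase} deg -> {delay:.1f} ms after the beat")
```

Under this assumption, the same phase corresponds to a shorter absolute delay at faster rates, which is one reason rate and phase are varied independently.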
Neely, S. T.; Harris, S. E.; Hajicek, J. J.; Petersen, E. A.; Shen, Y.
In a loudness-matching paradigm, a reduction in the loudness of sounds with bandwidths less than one-half octave compared to a tone of equal sound pressure level has been observed previously for five-tone complexes at 60 dB SPL centered at 1 kHz. Here, this loudness-reduction phenomenon is explored using band-limited noise across wide ranges of frequency and level. Additionally, these measurements are simulated by a model of loudness judgement based on neural ensemble averaging (NEA), which serves as a proxy for central auditory signal processing. Multi-frequency equal-loudness contours (ELC) were measured for each of the adult participants (N=100) with pure-tone average (PTA) thresholds that ranged from normal to moderate hearing loss using a categorical-loudness-scaling (CLS) paradigm. Presentation level and center frequency of the test stimuli were determined on each trial according to a Bayesian adaptive algorithm, which enabled multi-frequency ELC estimation within about five minutes of testing. Three separate test conditions differed by stimulus type: (1) pure-tone, (2) quarter-octave noise and (3) octave noise. For comparison, loudness judgements for all three stimulus types were also simulated by the NEA model, which comprised a nonlinear, active, time-domain cochlear model with an appended stage of neural spike generation. Mid-bandwidth loudness reduction was observed to be greatest at moderate stimulus levels and frequencies near 1 kHz. This feature was approximated by the NEA model, which suggests involvement of an early stage of the central auditory system in the formation of loudness judgements.
Rotaru, I.; Geirnaert, S.; Heintz, N.; Bertrand, A.; Francart, T.
Selective auditory attention decoding (AAD) enables tracking which of multiple concurrent speakers a listener attends to and is a key building block for neuro-steered hearing devices. While AAD integrated in a closed-loop system with real-time neurofeedback (NFB) is hypothesized to improve decoding through neural adaptation and error-correction behaviour, the short-term behavioral and algorithmic impact of such a bilateral human-machine interaction remains poorly understood. Here we evaluated the effects of NFB on AAD accuracy and user experience in a single-session AAD paradigm with online NFB involving nineteen participants. They performed a selective listening task with enforced attention switches across four conditions: open-loop (OL), closed-loop with auditory gain feedback (CLA), closed-loop with visual feedback (CLV), and a condition with pseudo-auditory gain control (psCLA) decoupled from the participants' individual neural activity. AAD was performed online using both subject-specific and subject-independent linear decoders on 5 s sliding windows, followed by Hidden Markov Model post-processing. Online analysis showed comparable decoding performance across all conditions. However, offline post hoc analysis using subject-independent decoders revealed that AAD accuracy in the CLA condition was significantly lower than in the OL baseline. Subjectively, participants reported that CLA was significantly more distracting and required higher switching effort. Crucially, a causal analysis of the psCLA condition found no robust evidence that higher audio gains inherently improve decoding accuracy. Our results demonstrate that within a single-session paradigm with rapidly varying feedback cues, auditory neurofeedback may degrade AAD performance by increasing cognitive load and distraction. These findings suggest that suboptimal feedback can impede rather than facilitate learning.
We conclude that more accurate and stable decoders and longitudinal, multi-session training protocols are likely essential prerequisites for achieving beneficial neurofeedback effects in closed-loop auditory attention systems.
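The abstract mentions Hidden Markov Model post-processing of window-level decoder outputs. As a rough illustration of the general idea only, the sketch below runs a two-state HMM forward filter over per-window probabilities. It is not the authors' pipeline: the function name `hmm_forward_filter`, the two-state layout, and the `p_stay` value are all assumptions made for this example.

```python
def hmm_forward_filter(p_left, p_stay=0.9):
    """Smooth per-window attention estimates with a two-state HMM.

    p_left : per-window probabilities (strictly between 0 and 1) that the
             attended speaker is 'left', treated here as the likelihood of
             the 'left' state (hypothetical decoder output).
    p_stay : assumed probability that attention stays on the same speaker
             from one window to the next.
    Returns the filtered posterior probability of 'left' for each window.
    """
    belief = 0.5  # uninformative starting belief
    posterior = []
    for p in p_left:
        # Prediction step: attention may have switched since the last window.
        prior = belief * p_stay + (1.0 - belief) * (1.0 - p_stay)
        # Update step: weigh the prior by the evidence from this window.
        numerator = prior * p
        denominator = numerator + (1.0 - prior) * (1.0 - p)
        belief = numerator / denominator
        posterior.append(belief)
    return posterior


if __name__ == "__main__":
    raw = [0.8, 0.8, 0.3, 0.8, 0.8]  # one deviant window in the middle
    print([round(b, 3) for b in hmm_forward_filter(raw)])
```

In this toy input the third window on its own would be decoded as 'right', but the filtered estimate stays above 0.5, which illustrates why such smoothing can stabilize decisions at the cost of slower reaction to genuine attention switches.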
Sotero Silva, N.; Kayser, C.
Recent studies describe Eye Movement-related Eardrum Oscillations (EMREOs), low-frequency signals recorded in the ear canal that arise from the tympanic membrane and are triggered by saccadic eye movements. Because EMREOs are thought to arise from motor elements in the peripheral auditory system, we examined how two known modulators of these elements affect the EMREO time course. First, the activity of outer hair cells (OHCs) can be suppressed by the medial olivocochlear reflex (MOCR). If OHCs contribute to the generation of EMREOs, activation of this reflex should reduce EMREO amplitude. To test this, we compared EMREO amplitudes elicited by saccades performed in silence and in the presence of contralateral noise. Second, gravitational cues linked to head orientation may influence EMREOs via oculomotor control circuits that possibly modulate middle ear muscles. To test this, we recorded EMREOs while participants made saccades with their head upright (0° azimuth) and with their head tilted 30° in either direction. Across both experiments, our data reveal no clear modulation of the EMREO time course by these experimental manipulations. Together with other recent studies, these findings point to the stability of the EMREO time course across multiple experimental manipulations and fuel speculation that the signal may serve as a temporal reference frame when combining signals across the senses.
Augsten, M.-L.; Lindenbeck, M. J.; Laback, B.
Cochlear implant (CI) users typically experience difficulties perceiving musical harmony due to a restricted spectro-temporal resolution at the electrode-nerve interface, resulting in limited pitch perception. We investigated how stimulus parameters affect discrimination of complex-tone triads (three-voice chords), aiming to identify conditions that maximize perceptual sensitivity. Six post-lingually deafened CI listeners completed a same/different task with harmonic complex tones, while spectral complexity, voice(s) containing a pitch change, and temporal synchrony (simultaneous vs. sequential triad presentation) were manipulated. CI listeners discriminated harmonically relevant one-semitone pitch changes within triads when spectral complexity was reduced to three or five components per voice, with significantly better performance for three-component compared to nine-component tones. Sensitivity was observed for pitch changes in the high voice or in both high and low voices, but not for changes in only the low voice. Single-voice sensitivity predicted simultaneous-triad sensitivity when controlling for spectral complexity and voice with pitch change. Contrary to expectations, sequential triad presentation did not improve discrimination. An analysis of processor pulse patterns suggests that difference-frequency cues encoded in the temporal envelope rather than place-of-excitation cues underlie perceptual triad sensitivity. These findings support reducing spectral complexity to enhance chord discrimination for CI users based on temporal cues.
Garcia Ruiz, T.; Sanes, D. H.
Many perceptual skills improve with a few days of training. However, weeks or months of practice may be required to reach a level of expertise on complex tasks (Watson, 1980). Here, we explored how gerbils attain expertise on a difficult task: amplitude modulation (AM) rate discrimination at very shallow AM depths, similar to the depths used during vocal communication. Using an appetitive Go-Nogo procedure, we first trained 6 gerbils to perform an AM discrimination task (Nogo: 4 Hz; Go: 4.25-10 Hz) at a depth of 0 dB (re: 100% depth). Animals were then trained to perform AM discrimination at successively shallower depths, from -3 to -18 dB, requiring an average of 5-10 days of practice to reach a performance metric of d′ ≥ 1 for each depth. Finally, we determined that AM discrimination thresholds were nearly identical between 0 and -12 dB, and only slightly elevated at -15 dB. Improvements in performance were accompanied by a large reduction in response time during procedural learning, and a gradual reduction of response time during perceptual learning, even as AM depth became shallower (i.e., more difficult). The shallowest depth at which gerbils displayed peak performance on the AM discrimination task is similar to their lowest AM depth detection thresholds. These results suggest that performance on challenging auditory perceptual tasks requires prolonged practice, and is accompanied by increased automaticity (i.e., lower response time) that stabilizes once expertise is achieved.
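Two quantities in this abstract lend themselves to a short worked example: AM depth in dB re 100% depth, and the d′ criterion. The sketch below assumes the common conventions depth_dB = 20·log10(m) for a modulation depth fraction m, and d′ = z(hit rate) − z(false-alarm rate) for a Go-Nogo task. The abstract does not spell out either formula, and the function names are hypothetical.

```python
from statistics import NormalDist


def depth_db_to_fraction(depth_db: float) -> float:
    """Convert AM depth in dB (re: 100% depth) to a modulation fraction.

    Assumes the convention depth_db = 20 * log10(m).
    """
    return 10.0 ** (depth_db / 20.0)


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index d' = z(hit) - z(false alarm).

    Rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


if __name__ == "__main__":
    for db in (0, -3, -6, -12, -15, -18):
        print(f"{db:>4} dB -> {100 * depth_db_to_fraction(db):.1f}% depth")
    # Illustrative rates only; these are not data from the study.
    print(f"d' = {d_prime(0.8413, 0.1587):.2f}")
```

Under the assumed convention, the shallowest depth mentioned (-18 dB) corresponds to roughly one eighth of full modulation.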
Sivaprakasam, A.; Schweinzger, I.; Heinz, M.
Aging and noise over-exposure lead to complex mixtures of cochlear degradation that impair the structure and function of outer hair cells, inner hair cells (IHCs), and the cochlear nerve. However, IHC damage and cochlear synaptopathy (CS) remain pathologies "hidden" from the audiogram. This study aimed to identify and differentiate the physiological signatures of these two distinct pathologies using promising non-invasive assays: Envelope Following Responses (EFRs), Auditory Brainstem Responses (ABRs), Wideband Middle Ear Muscle Reflexes (WB-MEMRs), and Distortion Product Otoacoustic Emissions (DPOAEs). We utilized chinchilla models of carboplatin-induced (CA) IHC damage (N = 4) and temporary threshold shift (TTS) noise-induced CS (N = 4) to compare the physiological signatures of each pathology. While both groups showed unchanged ABR thresholds two weeks after exposure, EFRs, ABR Wave V/I ratios, and MEMRs showed distinct effects of exposure. Despite non-elevated ABR-derived audiometric thresholds after exposure, both CA and TTS exposure resulted in severe changes in EFR "peakiness", particularly for sharp, short-duty-cycle stimuli, and significant elevations in ABR Wave V/I ratios. However, these findings were less pronounced in the TTS-exposed animals. WB-MEMR amplitudes were decreased with elevated thresholds in both groups; this effect was more pronounced in the TTS group. Opposite trends in DPOAE amplitudes indicated that while both IHC damage and CS result in similar suprathreshold temporal coding deficits, effects on outer-hair-cell integrity and auditory efferent physiology may differ between the two pathologies. Future work and novel diagnostics should aim to distinguish these specific cochlear pathologies in clinical populations, or at the very least consider their overlap.
Highlights:
- A multi-metric diagnostic approach was used with chinchilla models of inner-hair-cell (IHC) damage and cochlear synaptopathy (CS).
- IHC damage and synaptopathy both cause suprathreshold deficits "hidden" from the audiogram.
- IHC damage results in more severe temporal envelope coding degradation than does synaptopathy.
- A combination of EFR "peakiness", ABR Wave V/I ratio, and Wideband Middle Ear Muscle Reflex (WB-MEMR) appears to be useful for profiling IHC damage and CS.
Zogby, D. S.; Eddington, V. M.; Craig, E. C.; Kloepper, L. N.
Common terns (Sterna hirundo) are regionally threatened migratory seabirds that form large breeding colonies during the North American summer months. They are highly vocal and serve as important bioindicators of aquatic ecosystems. Historically, acoustic studies on colonial seabirds have proven difficult due to the dense aggregations of individuals and high rate of call overlap. However, as passive acoustic monitoring (PAM) becomes increasingly common for studying seabird colonies, quantitative descriptions of species vocalizations are needed to accurately interpret behavioral information from colony soundscapes and support automated analysis of large acoustic datasets. This study aims to quantify the vocal repertoire of adult common terns. We deployed AudioMoths to collect acoustic data at a tern colony on Seavey Island, New Hampshire, USA across the breeding season. Using RavenPro, unique call types were identified through visual and aural inspection of the acoustic data in the spectrogram. For each call, we then extracted measurements of peak frequency (Hz), bandwidth 90% (Hz), syllable duration 90% (s), and total bout duration (s) to quantify the characteristics of each call type. Statistical analyses for acoustic parameters by call type were performed using Kruskal-Wallis tests, followed by post-hoc Dunn tests. Our results demonstrate that each call type differs significantly from the others in at least one parameter, with the exception of the kek and kip/tjuk calls. These findings present the first quantitative analysis of common tern vocalizations for North America. By defining temporal and spectral characteristics for multiple call types, this work helps translate colony soundscapes into biologically meaningful information about tern behavior and colony dynamics. These descriptions also provide key parameters for developing automated tools to detect and classify vocalizations in dense, noisy colonies.
Integrating quantified vocal characteristics with PAM offers a promising approach for monitoring colony activity and behavior while minimizing disturbance relative to traditional methods.
Alavilli, S.; McDermott, J. H.
Our ability to recognize sound sources in the world is critical to daily life, but is not well documented or understood in computational terms. We developed a large-scale behavioral benchmark of human environmental sound recognition, built stimulus-computable models of sound recognition, and used the benchmark to compare models to humans. The behavioral benchmark measured how sound recognition varied across source categories, audio distortions, and concurrent sound sources, all of which influenced recognition performance in humans. Artificial neural network models trained to recognize sounds in multi-source scenes reached near-human accuracy and qualitatively matched human patterns of performance in many conditions. By contrast, traditional models of the cochlea and auditory cortex that were trained to recognize sounds produced worse matches to human performance. Models trained on larger datasets exhibited stronger alignment with both human behavior and brain responses. The results suggest that many aspects of human sound recognition emerge in systems optimized for the problem of real-world recognition. The benchmark results set the stage for future explorations of auditory scene perception involving salience and attention.
Figarola, V.; Liang, W.; Luthra, S.; Parker, E.; Winn, M.; Brown, C.; Shinn-Cunningham, B. G.
Listeners face many challenges when trying to maintain attention to a target source in everyday settings; for instance, reverberation distorts acoustic cues and interruptions capture attention. However, little is known about how these challenges affect the ability to maintain selective attention. Here, we measured syllable recall accuracy and pupil dilation during a spatial selective attention task that was sometimes disrupted. Participants heard two competing, temporally interleaved syllable streams presented in pseudo-anechoic or reverberant environments. On randomly selected trials, a sudden interruption occurred mid-sequence. Compared to anechoic trials, reverberant performance was worse overall, and the interrupter disrupted performance. In uninterrupted trials, reverberation reduced peak pupil dilation both when it was consistent across all stimuli in a block and when it was randomized trial to trial, suggesting temporal smearing reduced clarity of the scene and the salience of events in the ongoing streams. Pupil dilations in response to interruptions indicated perceptual salience was strong across reverberant and anechoic conditions. Specifically, baseline pupil size before trials did not vary across room conditions, and mixing or blocking of trials (altering stimulus expectations) had no impact on pupillary responses. Together, these findings highlight that stimulus salience drives cognitive load more strongly than does task performance.
Colak, H.; Benzaquen, E.; Guo, X.; Lad, M.; Sedley, W.; Griffiths, T. D.
Understanding speech in noisy environments (SPIN) is an important everyday ability, and engaging in musical activities has been proposed as a factor that may support this ability. However, the cognitive mechanisms underlying a potential musical advantage in SPIN perception remain unclear. Here we investigated whether musical sophistication is associated with better SPIN perception in a large population-based sample, and whether this relationship is mediated by auditory working memory (AWM), verbal working memory (VWM), or non-verbal intelligence. We recruited 203 participants and measured SPIN perception at both word and sentence levels. Musical sophistication was assessed using the Goldsmiths Musical Sophistication Index (Gold-MSI). AWM was measured using delayed matching of tone frequency or of the modulation rate of amplitude-modulated white noise, VWM was measured with a backward digit span task, and non-verbal intelligence was measured with matrix reasoning. Mediation analyses revealed that AWM fully mediated the relationship between musical sophistication and SPIN perception, whereas VWM showed no mediation effect. Non-verbal intelligence showed a partial mediating effect. Additional control analyses using structural equation modelling revealed that the indirect effect through AWM remained significant after accounting for age, hearing thresholds, and non-verbal intelligence. Together, these findings suggest that individuals with greater musical sophistication demonstrate better daily life listening abilities, and that superior auditory working memory may be the key cognitive mechanism underlying this advantage.
King, C. D.; Zhu, T.; Groh, J. M.
Information about eye movements is necessary for linking auditory and visual information across space. Recent work has suggested that such signals are incorporated into processing at the level of the ear itself (Gruters, Murphy et al. 2018). Here we report confirmation that the eye movement signals that reach the ear can produce perceptual consequences, via a case report of an unusual participant with tensor tympani myoclonus who hears sounds when she moves her eyes. The sounds she hears could be recorded with a microphone in the ear in which she hears them (left), and occurred for large leftward eye movements to extreme orbital positions of the eyes. The sounds elicited by this participant's eye movements were reminiscent of eye movement-related eardrum oscillations (EMREOs; Gruters, Murphy et al. 2018, Brohl and Kayser 2023, King, Lovich et al. 2023, Lovich, King et al. 2023, Lovich, King et al. 2023, Abbasi, King et al. 2025, Sotero Silva, Kayser et al. 2025, King and Groh 2026, Leon, Ramos et al. 2026, Sotero Silva, Brohl et al. 2026), but were larger and longer lasting than classical EMREOs, helping to explain why they were audible to her. Overall, the observations from this patient help establish that (a) eye movement-related signals specifically reach the tensor tympani muscle and that (b) when there is an abnormality involving that muscle, such signals can lead to actual audible percepts. Given that the tensor tympani contributes to the regulation of sound transmission in the middle ear, these findings support the view that eye movement signals reaching the ear have functional consequences for auditory perception. The findings also expand the types of medical conditions that produce gaze-evoked tinnitus, to date most commonly observed in connection with acoustic neuromas.
Stowell, D.; Nolasco, I.; McEwen, B.; Vidana Vila, E.; Jean-Labadye, L.; Benhamadi, Y.; Lostanlen, V.; Dubus, G.; Hoffman, B.; Linhart, P.; Morandi, I.; Cazau, D.; White, E.; White, P.; Miller, B.; Nguyen Hong Duc, P.; Schall, E.; Parcerisas, C.; Gros-Martial, A.; Moummad, I.
Computational bioacoustics has seen significant advances in recent decades. However, the rate of insights from automated analysis of bioacoustic audio lags behind our rate of collecting the data, owing to key capacity constraints in data annotation and bioacoustic algorithm development. Gaps in analysis methodology persist not because they are intractable, but because of resource limitations in the bioacoustics community. To bridge these gaps, we advocate the open science method of data challenges, structured as public contests. We conducted a bioacoustics data challenge named BioDCASE, within the format of an existing event (DCASE). In this work we report on the procedures needed to select and then conduct useful bioacoustics data challenges. We consider aspects of task design such as dataset curation, annotation, and evaluation metrics. We report the three tasks included in BioDCASE 2025 and the resulting progress made. Based on this, we make recommendations for open community initiatives in computational bioacoustics.
Manasevich, V.; Kostanian, D.; Rogachev, A.; Sysoeva, O.
Rise time (RT) is considered to be one of the most significant acoustical characteristics of auditory speech stimuli. A substantial amount of data has been accumulated on the neurophysiological mechanisms of RT processing under different conditions and in different groups of people, but these data have not been systematised. This review focuses on studies that have investigated electroencephalographic (EEG) markers of RT sensitivity. The literature search was conducted according to the PRISMA statement in the PubMed, Web of Science and APA PsycInfo databases. The resultant review comprised 37 studies that considered diverse aspects of RT processing. The review describes the main stimulation parameters affecting electrophysiological markers of RT processing reflected in different components of event-related potentials (ERPs), brainstem responses and cortical rhythmic activity. The main finding of this review is that rise time prolongation leads to a decrease in the amplitude of the main ERP components and an increase in their latencies. However, the sensitivity of the EEG markers varied, with the earliest components tracking subtle differences (few tens of microseconds) and the later components coding larger ones (up to 500 ms). Nevertheless, the observed effects may vary and depend on aspects of the experimental paradigm, the age of participants and speech-related problems. Future research may benefit from addressing understudied clinical groups and ERP components such as P1 and N2, which are dominant in children.
Al-Naji, A.; Schubotz, R. I.; Zahedi, A.
Research in cognitive neuroscience has relied on simple, highly controlled stimuli due to the difficulty in developing standardized, ecologically valid stimulus sets. However, there is a consensus that using ecologically valid stimuli is imperative to generalize results beyond controlled laboratory settings. The current study introduces a naturalistic audio stimulus database, consisting of short, recognizable, and emotionally rated stimuli. To create such a database, the current study collected 291 audio files from a wide range of sources. A total of 361 participants rated the audio clips on emotionality, arousal, and recognizability, and subsequently freely described each clip by typing what they believed the sound to be. The text responses of the participants were embedded and clustered using an unsupervised machine-learning algorithm to derive a participant-grounded organization of auditory object categories. The results indicate that the audio clips were easily recognizable, while emotionality and arousal ratings showed broad variability, making the database suitable for diverse experimental needs. Furthermore, the final database comprises 10 distinct semantic categories, providing a diverse set of auditory stimuli.
Eccher, E.; Salva, O. R.; Chiandetti, C.; Vallortigara, G.
Numerical abilities are widespread in the animal kingdom and are not exclusive to humans. Domestic chicks (Gallus gallus) have been shown to discriminate numerosities spontaneously, but prior research has focused exclusively on the visual modality. Whether chicks can discriminate numerical information in the auditory domain remains unknown, despite evidence that they can perceive other auditory features such as tone and rhythm. In this study, we investigated spontaneous numerical discrimination in the auditory modality in naive domestic chicks. In Experiment 1, newly-hatched chicks were tested for their ability to discriminate between two auditory sequences differing in numerosity (4 vs. 12 identical sounds), with and without controlling for continuous variables such as duration and total sound amount. Experiment 2 examined chicks' filial imprinting responses to familiar or unfamiliar numerosities. Experiment 3 controlled for potential spontaneous preferences for a single longer sound versus a shorter one. Our results showed a preference for the 12-sound sequence only when duration and total sound amount were not matched. When these continuous variables were controlled, no spontaneous numerical preference emerged. Experiment 2 revealed an overall preference for the 12-sound sequence regardless of imprinting conditions, while Experiment 3 confirmed that chicks do not have an inherent preference for longer sounds. These findings suggest that chicks are sensitive to overall magnitude in the auditory domain but do not spontaneously discriminate numerical differences when other continuous variables are held constant. Future studies will explore how specific stimulus features, such as heterogeneity of sounds, influence these preferences.
Rocchi, F.; Haukes, N. C.; van Opstal, A. J.; van Wanrooij, M. M.
Vision can shape auditory perception, especially when visual cues occur at different times and locations than sounds. Simultaneous but spatially misaligned lights bias the perceived location of a sound, a phenomenon known as the ventriloquism effect. Temporally misaligned lights can also affect the latency of auditory responses. However, it remains unclear how multiple visual stimuli that differ from sounds in both space and time jointly influence localization behaviour. We investigated how visual distractors, spatially misaligned by 10°, presented before and/or during a target sound influence localization accuracy and response latency in a rapid head-pointing task. Human listeners localized brief (150 ms) broadband noise bursts with an average root-mean-square error of 5° and a baseline latency of 252 ms. Simultaneous visual cues induced the ventriloquism effect, in which the perceived sound location was biased by 1.8°. Response latency also increased by 21 ms (273 ms). Preceding visual stimuli (2 s duration) did not induce a bias, but increased latency by 55 ms (307 ms). Introducing a 200 ms gap between the preceding light and the sound reduced this latency increase to 24 ms (276 ms), still not inducing a significant bias. When we presented both a preceding and a simultaneous light on opposite sides of the sound, localization reflected the bias induced by the simultaneous light (1.8°) and the latency increase induced by the preceding light (by 48 ms). These findings reveal a dissociation in audiovisual integration: preceding visual stimuli primarily influence when a sound is responded to (latency), while simultaneous stimuli influence where it is perceived (accuracy). This supports causal inference models of multisensory integration and suggests distinct underlying mechanisms for spatial and temporal processing of sounds in sensorimotor circuits.
Kamau, A. F.; Merchant, G. R.; Nakajima, H. H.; Neely, S. T.
Conductive hearing loss (CHL) with a normal otoscopic exam can be difficult to diagnose because routine clinical measures such as audiometric air-bone gaps (ABGs) can identify a conductive component but often cannot distinguish among specific underlying mechanical pathologies (e.g., stapes fixation versus superior canal dehiscence, which may produce similar audiograms). Wideband tympanometry (WBT) is a fast, noninvasive test that can provide additional mechanical information across a broad range of frequencies (200 Hz to 8 kHz). However, WBT metrics are influenced by variations in ear canal geometry and probe placement and can be challenging to interpret clinically. In this study, we extend prior WBT absorbance-based classification work by estimating the middle ear input impedance at the tympanic membrane (ZME), a WBT-derived metric intended to reduce ear canal effects. To estimate ZME, we fit an analog circuit model of the ear canal, middle ear, and inner ear to raw WBT data collected at tympanometric peak pressure (TPP). Data from 27 normal ears, 32 ears with superior canal dehiscence, and 38 ears with stapes fixation were analyzed. A multinomial logistic regression classifier was trained using principal component analysis (retaining 90% variance) and stratified 5-fold cross-validation with regularization. We compared feature sets based on ABGs alone, ABGs combined with absorbance, and ABGs combined with the magnitude of ZME. The combination of ABGs and the magnitude of ZME produced the best performance, achieving an overall accuracy of 85.6% compared to 80.4% for ABGs alone and 78.4% for ABGs combined with absorbance. These results suggest that incorporating model-derived middle ear impedance features with standard audiometric measures (ABGs) can improve automated pathology classification for stapes fixation and superior canal dehiscence.
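As a quick consistency check on the accuracies reported in this abstract, the sketch below asks which whole-number counts of correctly classified ears would round to each stated percentage. It assumes that overall accuracy was pooled over all 27 + 32 + 38 = 97 ears; the abstract does not report counts, so the results are inferences, and the function name `matching_counts` is hypothetical.

```python
def matching_counts(accuracy_pct: float, n_total: int) -> list:
    """Return every correct-count k for which 100*k/n_total rounds
    (to one decimal place) to the reported accuracy."""
    return [
        k
        for k in range(n_total + 1)
        if abs(round(100.0 * k / n_total, 1) - accuracy_pct) < 1e-9
    ]


if __name__ == "__main__":
    # Sample sizes from the abstract: normal, SCD, stapes fixation.
    n_ears = 27 + 32 + 38
    reported = {
        "ABGs + |ZME|": 85.6,
        "ABGs alone": 80.4,
        "ABGs + absorbance": 78.4,
    }
    for label, acc in reported.items():
        print(label, acc, "->", matching_counts(acc, n_ears), "of", n_ears)
```

If the pooling assumption holds, each reported percentage maps to a single count, and the gap between the best and the baseline feature set amounts to a handful of ears, which is worth keeping in mind when interpreting the improvement.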
Fischer, B. J.; Syeda, R. F.; Pena, J. L.
The cross-correlation model has long served as the standard computational framework for describing interaural time difference (ITD) processing in the barn owl's auditory system. While successful in explaining initial sinusoidal responses at the site of coincidence detection in the nucleus laminaris, this previous standard model fails to capture the full diversity of ITD tuning observed in the inferior colliculus (IC), where neurons exhibit sharper-than-sinusoidal ITD tuning, nonlinear frequency integration, level-dependent gain control, and interaural level difference (ILD)-dependent modulation of ITD selectivity. Here we present a modified cross-correlation model that addresses these limitations through the addition of parameterized gain control, linear filters with inhibitory surround structure, static nonlinearities, and ILD-dependent modulation of the cross-correlation computation. We show that divisive gain control produces realistic rate-level functions, including non-monotonic responses. Furthermore, inhibitory weights in the linear filter, combined with a threshold or expansive nonlinearity, generate sharper-than-sinusoidal ITD tuning consistent with experimental observations. This model reproduces both linear and nonlinear two-tone frequency integration and demonstrates that independent variation of filter bandwidth and nonlinearity shape accounts for the experimentally observed lack of correlation between side-peak suppression and frequency tuning width across the neuronal population. In addition, ILD-dependent modifications to the model produce shifts in best ITD and reductions in ITD tuning strength, as observed in the lateral shell of the central nucleus of the IC. The model parameters can be efficiently determined using simulation-based inference, enabling generation of realistic neuronal populations. Thus, this flexible, analytically tractable framework provides a foundation for investigating population coding of auditory space in the owl's midbrain.
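For readers unfamiliar with the baseline this abstract builds on, here is a minimal sketch of plain cross-correlation used to recover an interaural delay. It illustrates only the classic idea and includes none of the authors' extensions (gain control, inhibitory-surround filters, static nonlinearities, ILD dependence). The signal, the 3-sample delay, and the function name are assumptions for illustration, and lags are in samples because no sampling rate is assumed.

```python
import random


def cross_correlation(left, right, max_lag):
    """Return {lag: sum_i left[i] * right[i + lag]} for |lag| <= max_lag."""
    n = len(left)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:  # skip products that fall outside the signal
                total += left[i] * right[j]
        result[lag] = total
    return result


# Broadband noise at the left ear (fixed seed for repeatability).
rng = random.Random(0)
left_ear = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# The right ear receives the same signal delayed by 3 samples.
true_delay = 3
right_ear = [0.0] * true_delay + left_ear[:-true_delay]

correlation = cross_correlation(left_ear, right_ear, max_lag=10)
best_lag = max(correlation, key=correlation.get)

if __name__ == "__main__":
    print("estimated delay (samples):", best_lag)
```

The lag with the largest correlation serves as the delay estimate. For narrowband inputs the correlation function becomes periodic in lag, which gives rise to the side peaks that the abstract's "side-peak suppression" refers to.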
Harlow, T. J.; Korsu, H.; Almotwaly, L.; Corcoran, M. I.; Cole, S.; Chrobak, J. J.; Read, H. L.
Lateralized alpha-band oscillations are thought to reflect distractor inhibition through suppression of cortical regions processing spatially localized distracting stimuli. Standard lateralization indices (LI) quantify hemispheric asymmetries in spectral power over large time windows, while characterizations of the time-varying dynamics of alpha-mediated distractor inhibition are lacking. Here we evaluate comparison scanning generalized eigendecomposition (csGED), a multivariate signal processing technique, for its efficacy in addressing questions related to alpha-mediated distractor inhibition using a simulation of same-frequency sources at symmetric cortical locations adapted from Zuure and Cohen (2021). We show that while LI accurately captures topographic power asymmetries, csGED is effective at recovering source projections for bilateral, same-frequency activity across a wide range of signal-to-noise ratios (SNRs). We further extend these models to a pilot sample (N = 11) performing a speech-in-noise task using spatialized naturalistic distractors through individualized head-related transfer functions (HRTFs). Our results demonstrate the efficacy of GED in characterizing source projections during spatialized distractors, and provide preliminary evidence for shifts in oscillatory activity in both the alpha (7-13 Hz) and beta (15-25 Hz) frequency ranges during spatialized speech-in-noise tasks. Together, these results demonstrate the feasibility of csGED for investigating temporal dynamics of lateralized distractor inhibition and motivate larger confirmatory studies.
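The lateralization index mentioned in this abstract is often computed as a normalized difference in band power between hemispheres. The sketch below uses the common form LI = (P_right − P_left) / (P_right + P_left). The abstract does not give the exact formula or sign convention, so both are assumptions here, as is the function name.

```python
def lateralization_index(power_left: float, power_right: float) -> float:
    """Normalized hemispheric power asymmetry in the range [-1, 1].

    Assumed convention: positive values mean more power over the right
    hemisphere, negative values mean more power over the left.
    """
    total = power_left + power_right
    if total <= 0:
        raise ValueError("band power must be positive in at least one hemisphere")
    return (power_right - power_left) / total


if __name__ == "__main__":
    # Illustrative alpha-band power values in arbitrary units, not study data.
    print(lateralization_index(power_left=1.0, power_right=3.0))
    print(lateralization_index(power_left=2.0, power_right=2.0))
```

Because the index collapses each hemisphere to a single power value over a time window, it cannot separate two same-frequency sources that project to overlapping sensors, which is the gap the abstract says csGED is meant to address.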